[TUTORIALS] Add multicta tutorial by lezcano · Pull Request #9654 · triton-lang/triton

lezcano · 2026-03-05T13:21:11Z

We go over all the bits and pieces necessary to write multi-CTA kernels
in Gluon. We finish with a recipe to get SOTA perf on a dense matmul.

We also change the bench tool used in tutorial 8 (as the cudagraph bench
does not zero out the L2) and the cuBLAS numbers, as I was not
able to repro the numbers there (they have probably optimised cuBLAS
in a newer version).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32682077d6

python/tutorials/gluon/14-multicta.py

Jokeren · 2026-03-06T01:02:16Z

python/tutorials/gluon/14-multicta.py

+
+
+if __name__ == "__main__" and not is_blackwell():
+    raise RuntimeError("This tutorial requires a Blackwell NVIDIA GPU")


hmm, why do you still have the is_hopper_or_newer function then?
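
For context on the question above, here is a minimal sketch of how the two guards could differ. It is not the code from this PR: the helper bodies are hypothetical, and they assume that compute capability major 9 denotes Hopper and major 10 denotes the Blackwell parts this tutorial targets. The helpers take a `(major, minor)` tuple as an argument so they can be exercised without a GPU; in practice that tuple would likely come from something like `torch.cuda.get_device_capability()`.

```python
# Hypothetical helpers, written for illustration only.
# Assumption: capability major 9 = Hopper, major 10 = Blackwell.


def is_blackwell(capability):
    """Return True when the (major, minor) capability has major version 10."""
    major, _minor = capability
    return major == 10


def is_hopper_or_newer(capability):
    """Return True when the (major, minor) capability has major version >= 9."""
    major, _minor = capability
    return major >= 9


def require_blackwell(capability):
    """Raise the same error as the guard in the diff when not on Blackwell."""
    if not is_blackwell(capability):
        raise RuntimeError("This tutorial requires a Blackwell NVIDIA GPU")
```

Under these assumptions, a strict `is_blackwell` guard at the entry point makes a looser `is_hopper_or_newer` check redundant unless individual examples or tests are meant to run on Hopper as well.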

Jokeren · 2026-03-06T01:22:00Z

python/tutorials/gluon/14-multicta.py

+# `two_ctas=True`, because only the lead CTA waits before issuing the MMA.
+#
+# Once one `tcgen05_mma` in a kernel uses 2CTA mode, all of the `tcgen05_mma`
+# instructions in that kernel must use 2CTA mode.


why this requirement?

lezcano requested a review from ptillet as a code owner March 5, 2026 13:21

lezcano requested review from Mogball, ThomasRaoux and peterbell10 March 5, 2026 13:21

chatgpt-codex-connector bot reviewed Mar 5, 2026

support hopper for some tests

b9fd0e1

Jokeren reviewed Mar 6, 2026

lezcano wants to merge 2 commits into main from tutorial_multicta

2 participants


